24 research outputs found

    HTRIdb: an open-access database for experimentally verified human transcriptional regulation interactions

    Background: The modeling of interactions among transcription factors (TFs) and their respective target genes (TGs) into transcriptional regulatory networks is important for the complete understanding of regulation of biological processes. In the case of human TF-TG interactions, there is no database at present that explicitly provides such information even though many databases containing human TF-TG interaction data have been available. In an effort to provide researchers with a repository of TF-TG interactions from which such interactions can be directly extracted, we present here the Human Transcriptional Regulation Interactions database (HTRIdb).
Description: HTRIdb is an open-access database of experimentally validated interactions among human TFs and their TGs. HTRIdb can be searched via a user-friendly web interface, and the retrieved TF-TG interaction data and the associated protein-protein interactions can be downloaded or interactively visualized as a network using the Cytoscape Web software. Moreover, users can improve the database quality by uploading their own interactions and indicating inconsistencies in the data. So far, HTRIdb has been populated with 283 TFs that regulate 11886 genes, totaling 18160 TF-TG interactions. HTRIdb is freely available at http://www.lbbc.ibb.unesp.br/htri.
Conclusions: HTRIdb is a powerful, user-friendly tool from which experimentally validated human TF-TG interactions can be easily extracted and used to construct transcriptional regulation interaction networks, enabling researchers to decipher the regulation of biological processes.
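
As a rough illustration of how extracted TF-TG pairs could be assembled into a regulatory network, the sketch below groups targets by transcription factor and counts regulators per gene. The pair list, the placeholder names (`TF_A`, `GENE_1`, ...) and both function names are hypothetical; none of them comes from HTRIdb or its download format.

```python
from collections import defaultdict


def build_network(interactions):
    """Map each TF to the set of target genes it regulates."""
    network = defaultdict(set)
    for tf, tg in interactions:
        network[tf].add(tg)  # sets silently absorb duplicate pairs
    return dict(network)


def regulators_per_gene(interactions):
    """Count the distinct TFs regulating each target gene."""
    regulators = defaultdict(set)
    for tf, tg in interactions:
        regulators[tg].add(tf)
    return {tg: len(tfs) for tg, tfs in regulators.items()}


# Placeholder pairs for illustration only (not real HTRIdb records).
example_pairs = [
    ("TF_A", "GENE_1"),
    ("TF_A", "GENE_2"),
    ("TF_B", "GENE_1"),
    ("TF_A", "GENE_1"),  # duplicate record, counted once
]

network = build_network(example_pairs)
in_degree = regulators_per_gene(example_pairs)
```

Storing targets in sets keeps repeated records from inflating the counts, which matters when the same interaction is reported by several experiments.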

    Towards the prediction of essential genes by integration of network topology, cellular localization and biological process information

    Background: The identification of essential genes is important for the understanding of the minimal requirements for cellular life and for practical purposes, such as drug design. However, the experimental techniques for essential genes discovery are labor-intensive and time-consuming. Considering these experimental constraints, a computational approach capable of accurately predicting essential genes would be of great value. We therefore present here a machine learning-based computational approach relying on network topological features, cellular localization and biological process information for prediction of essential genes.
Results: We constructed a decision tree-based meta-classifier and trained it on datasets with individual and grouped attributes (network topological features, cellular compartments and biological processes) to generate various predictors of essential genes. We showed that the predictors with better performances are those generated by datasets with integrated attributes. Using the predictor with all attributes, i.e., network topological features, cellular compartments and biological processes, we obtained the best predictor of essential genes that was then used to classify yeast genes with unknown essentiality status. Finally, we generated decision trees by training the J48 algorithm on datasets with all network topological features, cellular localization and biological process information to discover cellular rules for essentiality. We found that the number of protein physical interactions, the nuclear localization of proteins and the number of regulating transcription factors are the most important factors determining gene essentiality.
Conclusion: We were able to demonstrate that network topological features, cellular localization and biological process information are reliable predictors of essential genes. Moreover, by constructing decision trees based on these data, we could discover cellular rules governing essentiality.

    Prediction of Druggable Proteins Using Machine Learning and Systems Biology: A Mini-Review

    The emergence of -omics technologies has allowed the collection of vast amounts of data on biological systems. Although the pace of such collection has been exponential, the impact of these data remains small on many critical biomedical applications such as drug development. Limited resources, high costs and a low hit-to-lead ratio have led researchers to search for more cost-effective methodologies. A possible alternative is to incorporate computational methods of potential drug target prediction early in the drug discovery workflow. Computational methods based on systems approaches have the advantage of taking into account the global properties of a molecule, not limited to its sequence, structure or function. Machine learning techniques are powerful tools that can extract relevant information from massive and noisy data sets. In recent years the scientific community has explored the combined power of these fields to develop increasingly accurate and low-cost methods for proposing interesting drug targets. In this mini-review, we describe promising approaches based on the simultaneous use of systems biology and machine learning to assess gene and protein druggability. Moreover, we discuss the state of the art of this emerging and interdisciplinary field, covering data sources, algorithms and the performance of the different methodologies. Finally, we indicate interesting avenues of research and some remaining open challenges.

    A machine learning approach for genome-wide prediction of morbid and druggable human genes based on systems-level data

    Background: The genome-wide identification of both morbid genes, i.e., those genes whose mutations cause hereditary human diseases, and druggable genes, i.e., genes coding for proteins whose modulation by small molecules elicits phenotypic effects, requires experimental approaches that are time-consuming and laborious. Thus, a computational approach which could accurately predict such genes on a genome-wide scale would be invaluable for accelerating the pace of discovery of causal relationships between genes and diseases as well as the determination of druggability of gene products.
Results: In this paper we propose a machine learning-based computational approach to predict morbid and druggable genes on a genome-wide scale. For this purpose, we constructed a decision tree-based meta-classifier and trained it on datasets containing, for each morbid and druggable gene, network topological features, tissue expression profile and subcellular localization data as learning attributes. This meta-classifier correctly recovered 65% of known morbid genes with a precision of 66% and correctly recovered 78% of known druggable genes with a precision of 75%. It was then used to assign morbidity and druggability scores to genes not known to be morbid and druggable, and we showed a good match between these scores and literature data. Finally, we generated decision trees by training the J48 algorithm on the morbidity and druggability datasets to discover cellular rules for morbidity and druggability and, among the rules, we found that the number of regulating transcription factors and plasma membrane localization are the most important factors for morbidity and druggability, respectively.
Conclusions: We were able to demonstrate that network topological features along with tissue expression profile and subcellular localization can reliably predict human morbid and druggable genes on a genome-wide scale. Moreover, by constructing decision trees based on these data, we could discover cellular rules governing morbidity and druggability.
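
The abstract above reports recall (the share of known genes recovered) and precision for each class but no combined score. As a small worked example, the sketch below computes the F1 score, the harmonic mean of precision and recall, from those reported percentages. The F1 values are derived here for illustration and are not figures reported by the authors; the function and variable names are hypothetical.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs taken from the abstract above.
reported = {
    "morbid": (0.66, 0.65),
    "druggable": (0.75, 0.78),
}

# Derived values, not results stated in the paper.
f1_by_class = {name: f1_score(p, r) for name, (p, r) in reported.items()}

for name, score in f1_by_class.items():
    print(f"{name}: F1 = {score:.3f}")
```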

    The Development of a Universal In Silico Predictor of Protein-Protein Interactions

    Protein-protein interactions (PPIs) are essential for understanding the function of biological systems and have been characterized using a vast array of experimental techniques. These techniques detect only a small proportion of all PPIs and are labor intensive and time consuming. Therefore, the development of computational methods capable of predicting PPIs accelerates the pace of discovery of new interactions. This paper reports a machine learning-based prediction model, the Universal In Silico Predictor of Protein-Protein Interactions (UNISPPI), which is a decision tree model that can reliably predict PPIs for all species (including proteins from parasite-host associations) using only 20 combinations of amino acids frequencies from interacting and non-interacting proteins as learning features. UNISPPI was able to correctly classify 79.4% and 72.6% of experimentally supported interactions and non-interacting protein pairs, respectively, from an independent test set. Moreover, UNISPPI suggests that the frequencies of the amino acids asparagine, cysteine and isoleucine are important features for distinguishing between interacting and non-interacting protein pairs. We envisage that UNISPPI can be a useful tool for prioritizing interactions for experimental validation.
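
To make the idea of amino acid frequency features concrete, the sketch below turns a protein pair into a 20-value feature vector, one value per standard amino acid. The abstract does not say how the two proteins' frequencies are combined, so summing them is an assumption made purely for illustration, and `aa_frequencies` and `pair_features` are hypothetical names rather than part of UNISPPI.

```python
from collections import Counter

# The 20 standard amino acids, in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_frequencies(sequence):
    """Return the relative frequency of each standard amino acid."""
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def pair_features(seq_a, seq_b):
    """Combine two frequency vectors into one 20-value feature vector.

    ASSUMPTION: element-wise sum, chosen so the result does not depend
    on the order of the two proteins. The published method may differ.
    """
    freq_a = aa_frequencies(seq_a)
    freq_b = aa_frequencies(seq_b)
    return [a + b for a, b in zip(freq_a, freq_b)]


# Toy sequences for illustration only.
features = pair_features("ACCA", "NNII")
```

A vector like this could then be passed to any decision tree learner together with an interacting or non-interacting label.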

    Descriptions of instances inserted in the test set.

    numb, numbers. *, indicates PPIs and no-PPIs within the same species. **, indicates PPIs and no-PPIs among different species. -, numbers not shown.

    A general workflow of the procedures adopted in this work.

    A general workflow of the procedures adopted in this work.